Abstract

Content-centric networking, also known as
information-centric networking (ICN), shifts emphasis
from hosts and interfaces (as in today's Internet) to
data. Named data becomes addressable and routable, 
while locations that currently store that data become ir- 
relevant to applications. 

Named Data Networking (NDN) is a large collabora- 
tive research effort that exemplifies the content-centric 
approach to networking. NDN has some innate privacy- 
friendly features, such as lack of source and destina- 
tion addresses on packets. However, as discussed in 
this paper, NDN architecture prompts some privacy con- 
cerns mainly stemming from the semantic richness of 
names. We examine privacy-relevant characteristics of
NDN and present an initial attempt to achieve communication
privacy. Specifically, we design an NDN add-on
tool, called ANDaNA, that borrows a number of features
from Tor. As we demonstrate via experiments, it provides
comparable anonymity with lower relative overhead.

1 Introduction 

Although the Internet, as a whole, is a huge global 
success story, it is showing clear signs of age. In the 
1970s, when core ideas underlying today's Internet were 
developed, telephony was the only example of effec- 
tive global-scale communications. Thus, while the com- 
munication solution offered by the Internet's TCP/IP 
suite was unique and ground-breaking, the communica- 
tion paradigm it focused on was similar to that of tele- 
phony: a point-to-point conversation between two en- 
tities. The communication world has changed dramat- 
ically since then and today's Internet has to accommo- 
date: information-intensive services, exabytes of con- 
tent created and consumed daily over the Web as well as 
a menagerie of mobile devices connected to it. To keep 
pace with these changes and move the Internet into the 
future, a number of research efforts to design new Inter- 
net architectures have taken off in the last few years. 

Named-Data Networking (NDN) [32] is one such effort
that exemplifies the content-centric approach
[27, 28] to networking. NDN names content instead of
locations (i.e., hosts or interfaces) and thus transforms 
content into a first-class entity. NDN also stipulates that 
each piece of content must be signed by its producer. 
This allows decoupling of trust in content from trust in 
the entity that might store and/or disseminate that con- 
tent. These NDN features facilitate automatic caching of 
content to optimize bandwidth use and enable effective 
simultaneous utilization of multiple network interfaces. 

However, NDN introduces certain challenges that 
must be addressed in order for it to be a serious can- 
didate for the future Internet architecture. One major 
argument for a new architecture is the inadequate level 
of security and privacy in today's Internet. We view 
anonymity as being a critical feature in any new network 
architecture. It helps people overcome communication 
restrictions and boundaries as well as evade censorship. 
In addition, some applications (e.g., e-cash or anony- 
mous publishing) can be successfully deployed only if 
the underlying network allows users to hide their identity
[14]. Even if end-users do not care about anonymity
with respect to services they access, they might still want 
to hide their activities from employers, governments and 
ISPs, since those might censor, misuse or accidentally 
leak sensitive information [19].



Lack of source/destination addresses in NDN helps 
privacy, since NDN packets carry information only 
about what is requested but not who is requesting it. 
However, a closer look reveals that this is insufficient. In
particular, NDN design introduces four important privacy
challenges:

1. Name privacy: NDN content names are incen- 
tivized to be semantically related to the content it- 
self. Similar to HTTP headers, names reveal sig- 
nificantly more information about content than IP 
addresses. Moreover, an observer can easily de- 
termine when two requests refer to the same (even 
encrypted) content. 

2. Content privacy: NDN allows any entity that 
knows a name to retrieve corresponding content. 
Encryption in NDN is used to enforce access control
and is not applied to publicly available content.
Thus, consumers wanting to retrieve public content 
cannot rely on encryption to hide what they access. 

3. Cache privacy: as with current web proxies, network
neighbors may learn about each other's content
access using timing information to identify
cache hits. 

4. Signature privacy: since digital signatures in 
NDN content packets are required to be publicly 
verifiable, identity of a content signer may leak sen- 
sitive information. 

In this paper, we attempt to address these challenges. We 
present an initial approach, called ANDaNA, that can be
viewed as an adaptation of onion routing to NDN. Our
approach is in line with NDN principles. It is designed
to take advantage of NDN strengths and work around 
its weaknesses. We optimized ANDaNA for small- to
medium-size interactive communication - such as web-browsing
and instant messaging - that is characterized
by moderate amounts of low-latency traffic [10].

We provide a security analysis of the proposed ap- 
proach under a realistic adversarial model. Specif- 
ically, we define anonymity and unlinkability under 
this security model and show that these properties hold 
for ANDaNA. Moreover, ANDaNA is secure with fewer 
anonymizing router hops than Tor. We prototyped
ANDaNA and assessed its performance via experiments
over a network testbed. Results show that ANDaNA introduces
less overhead than Tor, especially for anticipated
traffic patterns.

We believe that this work is both timely and impor- 
tant. The former - because of the recent surge of in- 
terest in content-centric networking and NDN being a 
good example of this paradigm. (Also, while NDN is 
sufficiently mature to have a functional prototype suit- 
able for experimental use, it is still at an early enough 
stage to be open to change.) The latter - because it rep- 
resents the first attempt to identify and address privacy 
problems in a viable candidate for the future Internet ar- 
chitecture. 

Before discussing details of our approach, we present 
further motivation for this work. 

Why NDN? There are multiple efforts to develop new
content-centric architectures and NDN is only one of 
those. We focus on NDN because it stands out in sev- 
eral aspects. First, it combines some revolutionary ideas 
about content-based routing that have attracted consider- 
able attention from the networking research community. 
Second, it builds upon an open-source code-base called 
CCNx [12], which is led and continuously maintained by
an industrial research lab (PARC). At the time of this
writing (summer 2011), NDN is one of the very few
content-centric architectural proposals with a reasonably
mature prototype available to the research community.¹
Third, NDN is one of only four projects selected by the NSF
Future Internet Architectures (FIA) program [20].

On the other hand, NDN is an on-going research 
project and is thus subject to continuous change. How- 
ever, we believe that it represents a good example of 
content-centric networking design and at least some of 
its concepts will influence the future of networking. 
More importantly, ideas, techniques and analysis dis- 
cussed in this paper are not specific, or limited to, NDN; 
they are applicable to a wide range of designs, including 
host-, location- and content-addressable networks. 

Approach. NDN follows the proven design principle 
of IP and claims to be the "thin waist" of the communi- 
cations protocol stack. Thus, pushing security or privacy 
services (that are not critical for all types of communica- 
tion) into this thin waist would contradict its design prin- 
ciple. Consequently, as in the case of IP, we believe that 
privacy tools should run on top of NDN. Looking at pri- 
vacy and anonymity techniques in today's Internet, one 
well-established approach is an overlay anonymization 
network, exemplified by Tor [18]. Tor and its relatives
employ layers of concentric encryption and intermedi- 
ate nodes responsible for peeling off layers as packets 
travel through the overlay. This is commonly referred 
to as onion routing. Our approach falls into roughly the 
same category. However, as we discover and discuss in 
this paper, the task of adapting an anonymization over- 
lay approach to NDN is not as simple as it might initially 
seem. 

Scope. The primary focus of this paper is privacy. 
Security and other features of NDN are taken as given 
without justifying their existence. A number of impor- 
tant NDN-related security topics are out of scope of this 
paper, including: trust management, certification and re- 
vocation of credentials as well as routing security. 

Organization. We start with an NDN overview and privacy
analysis in Section 2. Section 3 summarizes related
work, followed by the description of ANDaNA in Section 4.
Section 5 introduces a formal model for provable
anonymity and security analysis of ANDaNA. Implementation
details and performance evaluation results are discussed
in Section 6. The paper concludes in Section 7.

2 NDN Overview 

NDN [32] is a communication architecture based
on named content.² Rather than addressing content
by its location, NDN refers to it by name. Content 
name is composed of one or more variable-length com- 
ponents that are opaque to the network. Component 

¹ We are aware of only two other content-centric architecture proposals
- [33] and [36] - that have public prototypes.

² Note that we use the terms "content" and "data" interchangeably
throughout this paper.
boundaries are explicitly delimited by "/". For example,
the name of a CNN news content might be:
/ndn/cnn/news/2011aug20. Large pieces of content
can be split into fragments with predictable names:
fragment 137 of a YouTube video could be named:
/ndn/youtube/videos/video-749.avi/137.
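
The naming scheme described above can be illustrated with a minimal Python sketch. The helper names `parse_name` and `is_prefix` are our own and do not come from any NDN library; components are treated as opaque strings, and matching is done on whole components only.

```python
def parse_name(name):
    """Split an NDN-style name into its list of components."""
    if not name.startswith("/"):
        raise ValueError("name must start with '/'")
    return [c for c in name.split("/") if c]


def is_prefix(prefix, name):
    """Return True if `prefix` matches the leading components of `name`."""
    p, n = parse_name(prefix), parse_name(name)
    return n[:len(p)] == p


fragment = "/ndn/youtube/videos/video-749.avi/137"
print(parse_name(fragment))
print(is_prefix("/ndn/youtube", fragment))   # whole components match
print(is_prefix("/ndn/you", fragment))       # a partial component does not
```

Because the network only compares components, it learns the structure of a name even when individual components are opaque to it.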

Since the main abstraction is content, there is no ex- 
plicit notion of "hosts" in NDN. (However, their exis- 
tence is assumed.) Communication adheres to the pull 
model: content is delivered to consumers only upon ex- 
plicit request. A consumer requests content by sending 
an interest packet. If an entity (a router or a host) can 
"satisfy" a given interest, it returns the corresponding 
content packet. Interest and content are the only types 
of packets in NDN. A content packet with name X in 
NDN is never forwarded or routed unless it is preceded 
by an interest for name X.³

When a router receives an interest for name X and 
there are no pending interests for the same name in its 
PIT (Pending Interests Table), it forwards this interest to 
the next hop according to its routing table. For each for- 
warded interest, a router stores some state information, 
including the name in the interest and the interface on 
which it was received. However, if an interest for X ar- 
rives while there is an entry for the same name in the PIT, 
the router collapses the present interest (and any subse- 
quent ones for X) storing only the interface on which it 
was received. When content is returned, the router for- 
wards it out on all interfaces where an interest for X has 
been received and flushes the corresponding PIT entry. 
Note that, since no additional information is needed to 
deliver content, an interest does not carry a source ad- 
dress. More detailed discussion of NDN routing can be 
found in [27].
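
The forwarding behavior described above can be sketched as a toy router in Python. This is an illustration under simplifying assumptions, not the CCNx implementation: the `Router` class and its method names are hypothetical, names are matched exactly (no prefix matching), and routing-table lookup is reduced to recording which interests were sent upstream.

```python
class Router:
    """Toy NDN router: a PIT with interest collapsing plus a content cache."""

    def __init__(self):
        self.pit = {}        # name -> set of interfaces awaiting content
        self.cache = {}      # name -> cached content
        self.forwarded = []  # names sent upstream, kept for illustration

    def on_interest(self, name, iface):
        """Return cached content, or None if the interest is now pending."""
        if name in self.cache:
            return self.cache[name]
        if name in self.pit:
            self.pit[name].add(iface)    # collapse: remember interface only
        else:
            self.pit[name] = {iface}
            self.forwarded.append(name)  # first interest goes upstream
        return None

    def on_content(self, name, content):
        """Flush the PIT entry and return the interfaces to forward on."""
        ifaces = self.pit.pop(name, set())
        if ifaces:                       # content without an interest is dropped
            self.cache[name] = content
        return ifaces


r = Router()
r.on_interest("/ndn/cnn/news/2011aug20", 1)
r.on_interest("/ndn/cnn/news/2011aug20", 2)
print(r.forwarded)
print(sorted(r.on_content("/ndn/cnn/news/2011aug20", b"headline")))
print(r.on_interest("/ndn/cnn/news/2011aug20", 3))
```

Note that the router never stores a source address: the set of incoming interfaces in the PIT entry is all it needs to deliver content.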

In NDN, each network entity can provide content 
caching, which is limited only by resource availabil- 
ity. For popular content, this allows interests to be sat- 
isfied from cached copies distributed over the network, 
thus maximizing resource utilization. NDN deals with 
content authenticity and integrity by making digital sig- 
natures mandatory on all content packets. A signature 
binds content with its name, and provides origin au- 
thentication no matter how or from where it is retrieved. 
NDN calls entities that publish new content producers. 
Whereas, as follows from the above discussion, entities 
that request content are called consumers. (Consumers 
and producers are clearly overlapping sets.) Although 
content signature verification is optional in NDN, a sig- 
nature must be verifiable by any NDN entity. To make 
this possible, content packets carry additional metadata,
such as the ID of the content publisher and information
on locating the public key needed for verification. Public
keys are treated as regular content: since all content
is signed, each public key content is effectively a "certificate".
NDN does not mandate any particular certification
infrastructure, relegating trust management to individual
applications.

³ Strictly speaking, content named X' ≠ X can be delivered in
response to an interest for X but only if X is a prefix of X'. As an
example, the full name of each content packet contains the hash of that
content; however, this hash value is usually not known to consumers
and is typically omitted from interests.

Private or restricted content in NDN is protected via 
encryption by the content publisher. Once content is distributed
unencrypted, there is no mechanism to apply
subsequent encryption. Specific applications may pro- 
vide a means to explicitly request encryption of content 
by publishers. However, NDN does not currently allow 
consumers to selectively conceal content corresponding 
to their interests. 

From the privacy perspective, lack of source and des- 
tination addresses in NDN packets is a clear advantage 
over IP. In practice, this means that the adversary that 
eavesdrops on a link close to a content producer cannot
immediately identify the consumer(s) who expressed in- 
terest in that content. Moreover, two features of standard 
NDN routers: (1) content caching and (2) collapsing of 
redundant interests, reduce the utility of eavesdropping 
near a content producer since not all interests for the 
same content reach its producer.

On the other hand, NDN provides no protection 
against an adversary that monitors local activity of a specific
consumer. As most content names are expected to
be semantically relevant to content itself, interests can 
leak a lot of information about the content they aim to 
retrieve. To mitigate this issue, NDN allows the use 
of "encrypted names", whereby a producer encrypts the 
tail-end (a few components) of a name [23].⁴ However,
this simple approach does not provide much privacy: the 
adversary can link multiple interests for the same con- 
tent - or those sharing the same name prefix - issued by 
different consumers. Moreover, an adversary can always 
replay an interest to see what (possibly cached) content 
it returns, even if a name of content is not semantically 
relevant. 

3 Related Work 

The goal of anonymizing tools and techniques is to 
decouple actions from entities that perform them. The 
most basic approach to anonymity is to use a trusted 
anonymizing proxy. A proxy is typically interposed be- 
tween a sender and a receiver in order to hide identity 
of the former from the latter. The Anonymizer [3] and
Lucent Personalized Web Assistant [22] are examples of
this approach. While relatively efficient, it is susceptible 
to a (local) passive adversary that monitors all proxy activity.
Also, a centralized proxy necessitates centralized
(global) trust and represents a single point of failure.

⁴ For example, a name such as
/ndn/xerox/parc/Alice/family/photos/Hawaii might
be replaced with /ndn/xerox/parc/Alice/encrypted-part.

A more sophisticated decentralized approach is used 
in mix networks [13]. Typically, a mix network achieves
anonymity by repeatedly routing a message from one
proxy to another, such that the message gradually loses
any relationship with its originator. Messages must be
made unintelligible to potentially untrusted intermediate
nodes. Chaum's initial proposal [13] defines an anonymous
email system, wherein a sender envelops a message
with several concentric layers of public key encryption.
The resulting message is then forwarded to a sequence
of mix servers that gradually remove one layer
of encryption at a time and forward the message to the 
next mix server. 

Subsequent research generally falls into two classes: 
delay-tolerant applications (e.g. email, file sharing) and 
real-time or low-latency applications (e.g. web brows- 
ing, VoIP, SSH). These two classes achieve different 
tradeoffs between performance (in terms of latency and 
bandwidth) and anonymity. For example, Babel [24],
Mixmaster [30] and Mixminion [16] belong to the first
category. Their goal is to provide anonymity with re- 
spect to the global eavesdropper adversary. Each mix 
introduces spurious traffic and randomized traffic delays 
in order to inhibit correlation between input and out- 
put traffic. However, unpredictable traffic characteris- 
tics and high delays make these techniques unsuitable 
for many applications. 

Low-latency anonymizing networks are at the other 
end of the spectrum. They try to minimize extra latency 
by forwarding traffic as fast as possible. Because of this, 
strategies used in anonymization of delay-tolerant traffic 
- batching (delaying) and re-ordering of traffic in mixes,
as well as introduction of decoy traffic - are generally
not applicable. For example, [40] shows how traffic patterns
can be used for de-anonymization in low-latency
anonymity systems. Notable low-latency tools are sum- 
marized below. 

Crowds [37] is a low-latency anonymizing network
for HTTP traffic. It differs from traditional mix-based
approaches as it lacks layered encryption. For each message
it receives, an anonymizer probabilistically chooses
to either forward it to a random next hop within the
Crowds network or deliver it to its final destination.
Since messages are not encrypted, Crowds is vulnerable
to local eavesdroppers and predecessor attacks [43].

Morphmix [38, 39] is a fully distributed peer-to-peer
mix network that uses layered encryption. Unlike
Crowds, it does not require a lookup service to keep
track of all participating nodes. A sender selects the first
anonymizer and each anonymizer along an "anonymous
tunnel" picks the next hop to dynamically build tunnels.
Tarzan [21] is another fully distributed peer-to-peer mix
network. It builds a universally verifiable set of neighbors
(called mimics) for every node to keep track of
other Tarzan participants. Every node selects its
mimics pseudo-randomly.

Tor [18] is the best-known and most-used low-latency
anonymizing tool. It is based on onion routing and
layered encryption. Tor uses a central directory to locate
participating nodes and requires users to build a
three-hop anonymizing circuit by choosing three random
nodes. The first is called the guard, the second
the middle, and the third the exit node. Once set up,
each circuit in Tor lasts about 10 minutes. For better
performance, bandwidth available to nodes is taken into
account during circuit establishment and multiple TCP
connections are multiplexed over one circuit. Communication
between Tor nodes is secured via SSL. However,
Tor does not introduce any decoy traffic or randomization
to hide traffic patterns. Another anonymization
tool, I2P [26], adopts many ideas of Tor, while using a
distributed untrusted directory service to keep track of
its participants. I2P also replaces Tor's circuit-switching
operation with packet-switching to achieve better load
balancing and fault-tolerance.

A consumer privacy technique for Information-Centric
Networks (ICNs) is proposed in [4]. Instead of
using encryption, it leverages cooperation from content 
producers and requires them to mix sensitive informa- 
tion with so-called "cover" content. This approach re- 
quires producers to cooperate and store a large amount 
of cover traffic. It also does not provide consumer- 
producer unlinkability or protection against malicious 
producers. 

Telex [44] is an alternative to mix networks designed
to evade state-level censorship. It uses steganographic
techniques to hide messages in SSL handshakes.
Users connect to innocuous-looking unblocked websites
through SSL. Sympathetic ISPs that forward users' traffic
recover hidden messages and deliver them to the intended
destination. While novel, this approach presents
significant deployment challenges and requires support
from the network infrastructure. Furthermore, the threat
model in Telex is quite different from that of the other
anonymizing tools presented above. Moreover, established
TCP fingerprinting techniques can easily detect
differences between a Telex station and a censored website.
Another analogous technique - called Cirripede
[25] - was recently proposed.

4 ANDaNA 

ANDaNA is an onion routing overlay network, built
on top of NDN, that provides privacy and anonymity 
to consumers. In particular, ANDaNA prevents adver- 
saries from linking consumers with the content they 
are retrieving. Following the terminology introduced 
in [37], ANDaNA provides the beyond suspicion⁵ degree of
anonymity to its users. 

ANDaNA uses multiple concentric layers of encryp- 
tion and routes messages from consumers through a 
chain of at least two onion routers. Each router removes 
a layer of encryption and forwards the decrypted mes- 
sages to the next hop. Due to its low-latency focus, 
ANDaNA does not guarantee privacy in presence of a 
global eavesdropper. However, since it is geared for a
world-wide (or at least geographically distributed) net- 
work spanning a multitude of administrative domains, 
the existence of such an adversary is unlikely. For this 
reason, we restrict the adversarial capabilities to eaves- 
dropping on, injecting, removing or modifying mes- 
sages on a subset of available links. An adversary 
can compromise NDN routers and ANDaNA nodes at 
will. Nonetheless, consumers benefit from anonymity as 
long as they use at least one non-compromised ANDaNA 
node. Details of our adversarial model and formal privacy
guarantees are discussed in Section 5.

4.1 Design 

We now present two techniques — asymmetric and 
session-based — that provide privacy and anonymity for 
NDN traffic. Traffic is routed through ephemeral cir- 
cuits, that are defined as a pair of distinct anonymizing 
routers (ARs). An AR is an NDN node (e.g. a router or a
host) that chooses to be part of ANDaNA. An ephemeral 
circuit transports only one (or only a few) encrypted in- 
terest(s). It disappears either when the corresponding 
content gets delivered, or after a short timeout (hence
"ephemeral"). A timeout interval is needed so that the
consumer can re-issue the same encrypted interest in 
case of packet loss. We refer to the first AR as entry 
router and the second - as exit router. They must not 
belong to the same administrative domain and must not 
share the same name prefix. Optionally, consumers can 
select ARs according to some parameters, such as adver- 
tised bandwidth, availability or average load. As pointed 
out in [5, 31], there is a well-known natural tension between
non-uniform (i.e. performance-driven) choice of
routers and anonymity. Consumers should consider this 
when selecting ARs. 
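
The circuit-selection constraints above can be sketched as follows. This is a hedged illustration, not the ANDaNA implementation: the `AR` record and `select_circuit` function are hypothetical, "same name prefix" is simplified to string equality of advertised prefixes, and the choice is uniform (no bandwidth or load weighting).

```python
import random
from collections import namedtuple

# Hypothetical record for an advertised anonymizing router.
AR = namedtuple("AR", "name org prefix")


def select_circuit(ars, rng=random):
    """Pick an (entry, exit) pair from distinct organizations and prefixes."""
    candidates = [(a, b) for a in ars for b in ars
                  if a.org != b.org and a.prefix != b.prefix]
    if not candidates:
        raise ValueError("no valid pair of ARs")
    return rng.choice(candidates)


ars = [
    AR("ar1", "org-a", "/ndn/org-a/ar1"),
    AR("ar2", "org-b", "/ndn/org-b/ar2"),
    AR("ar3", "org-a", "/ndn/org-a/ar3"),
]
entry, exit_ = select_circuit(ars)
print(entry.name, exit_.name)
```

Enumerating valid pairs up front avoids retrying forever when no two ARs satisfy the constraints; a performance-driven (non-uniform) choice would replace `rng.choice` and carries the anonymity trade-off noted above.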

To build an ephemeral circuit, a consumer retrieves 
the list of ARs and corresponding public keys. Although 
we do not mandate any particular technique, a consumer 
can retrieve this list using, e.g., a directory service [18]
or a decentralized (peer-to-peer) mechanism. AR public
keys can be authenticated using decentralized techniques
(such as web-of-trust [2]) or a PKI infrastructure.

⁵ For any packet observed by the adversary, an entity is considered
beyond suspicion if it is as likely to be the sender of this packet as any 
other entity. 

*Note that implicit replication implemented through caching al- 



A prospective AR joins ANDaNA by advertising its 
public key, together with its identity defined as: names- 
pace, organization and public key fingerprint. An AR 
also publishes auxiliary information, such as total band- 
width, average load, and uptime. 

As mentioned earlier, both interest and content pack- 
ets leak information. Even if names in interests are hid- 
den, three components of content packets — signatures, 
names and content itself — contain potentially sensitive 
information. Of course, content producers could sim- 
ply generate a new key-pair to sign each content packet. 
This would be impractical, since high costs of key gen- 
eration and distribution would make it difficult for con- 
sumers to authenticate content. (Note that key-evolving 
schemes [8] do not help, since verification keys generally
evolve in a way that is predictable to all parties, including
the adversary.) Alternatively, the original content
signature could be replaced with that generated by
an AR. However, this would preclude end-to-end content
verifiability and thus break the NDN trust model.

For this reason, ANDaNA implements encrypted en- 
capsulation of original content, using two symmetric 
keys securely distributed by the consumer to the ARs 
during setup of the ephemeral circuit. Upon receiving a 
content packet, the exit router encrypts it, together with 
the original (cleartext) name and signature, under the 
first key provided by the consumer. Then, treating the 
ciphertext as payload for a new content packet, the exit 
router signs and sends it to the entry router. The latter
strips this signature and the name and encrypts the remaining
ciphertext under the second symmetric key provided
by the consumer. Next, it forwards the ciphertext
with the original encrypted name and a fresh (its own) 
signature. After decrypting the payload, the consumer 
discards the signature from the entry router and verifies 
the one from the content producer. 

Because decryption is deterministic, an encrypted in- 
terest sent to an AR always produces the same output. 
Since ARs are a public resource, the adversary can use 
them to decrypt previously observed interests. It can 
thus observe the corresponding output and correlate in- 
coming/outgoing interests. This is a well-known attack 
and there are several ways to mitigate it, such as encrypted
channels between communicating parties [18]
and mixing (for delay-tolerant traffic) [24]. However,
such techniques tend to have significant impact on com- 
putational costs and latency. Instead, we use standard 
NDN features of interest aggregation and caching to pre- 
vent such attacks, as described next. 

In NDN, a router (not just an AR) that receives dupli- 
cate interests collapses them. An interest is considered 
a duplicate, if it arrives while another interest referring 

lows the construction of a directory system with better resilience 
against denial-of-service (DoS) attacks than IP. 
to the same content has not been satisfied. Also, if the
original interest has been satisfied and the correspond- 
ing content is still in cache, a new interest requesting 
the same piece of data is satisfied with cached content. 
In this case, the router does not forward any interests. 
Therefore, the adversary must wait for the expiration of 
cached content. 

As part of ANDaNA, the consumer includes its current 
timestamp within each encryption layer. ARs reject interests
with timestamps outside a pre-defined time window.
Thus, consumers need to be loosely synchronized
with ARs, which must reserve at least (rate × window) of
cache, where rate is the router's wire-rate and window 
is the interval within which interests are accepted. In 
this way, if an interest is received multiple times by an 
AR (e.g. in case of loss of the corresponding data packet 
between the AR and the consumer), the AR is able to 
satisfy it using its cache. 
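
As a worked example of the (rate × window) bound, the following sketch computes the minimum cache reservation and checks whether a timestamp is acceptable. The figures (1 Gbit/s, 2 s) are illustrative only and do not come from our experiments; the function names are hypothetical, and we assume here that the window is centered on the AR's current time.

```python
def min_cache_bytes(rate_bps, window_s):
    """Cache needed to hold everything that can arrive within one window."""
    return rate_bps * window_s / 8


def is_current(timestamp, now, window_s):
    """Accept timestamps within half a window on either side of `now`."""
    return abs(now - timestamp) <= window_s / 2


print(min_cache_bytes(1_000_000_000, 2.0))  # 1 Gbit/s for 2 s, in bytes
print(is_current(100.0, 100.4, 1.0))
print(is_current(100.0, 101.0, 1.0))
```

With these numbers an AR would need to reserve 250 MB of cache, which shows why the acceptance window should be kept short.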

The encryption algorithm used by consumers to conceal
names in interests must be secure against adaptive
chosen ciphertext (CCA) attacks.⁷ CCA-security
[9] implies, among other things, probabilistic encryption
and non-malleability. The former prevents the adversary 
from determining whether two encrypted interests cor- 
respond to the same unencrypted interest. Whereas, the 
latter implies that the adversary cannot modify interests 
to defeat the mechanism described above. 

We now describe two flavors of anonymization pro- 
tocols: asymmetric and session-based. In order to al- 
low efficient routing of interest packets, the encrypted 
component is encoded at the end of the name with both 
flavors. 

Asymmetric: To issue an interest, a consumer selects 
a pair of ARs and uses their public keys to encrypt the 
interest, as described above and in Algorithm 1. A consumer
also generates two symmetric keys: k1 and k2
that will be used to encrypt the content packet on the
way back. We use E_pk(·) and E_k(·) to denote (CCA-secure)
public key and symmetric encryption schemes,
respectively. 

To account for the delay due to extra hops needed 
to reach the second AR (and reduce the number of dis- 
carded interests), a consumer adds half of the estimated 
round trip time (RTT) to the innermost timestamp. Each 
AR removes the outermost encryption layer, as detailed 
in Algorithm 2. Since E_pk(·) is CCA-secure, the decryption
process fails if the ciphertext has been modified in
transit or was not encrypted under the AR's public key.
Content corresponding to the encrypted interest is encrypted
on the way back, as detailed in Algorithm 3, using
E_k(·) and symmetric keys supplied by the consumer.

⁷ Technically, in order to guarantee correctness an encryption
scheme suitable for ANDaNA must also be robust [1]. However, since
CCA-secure encryption schemes used in practice are also robust, we
omit this requirement in the rest of the paper.

Algorithm 1: Encrypted Interest Generation

input : Interest int; set of ℓ ARs and their keys:
        R = {(AR_i, pk_i) | 1 ≤ i ≤ ℓ, pk_i ∈ PK}
output: Encrypted interest int_{pk_i,pk_j}; symmetric keys k1, k2
1: Select (AR_i, pk_i), (AR_j, pk_j) from R
2: if AR_i = AR_j or AR_i, AR_j are from same organization or
      AR_i, AR_j share the same name prefix then
3:    Go to line 1
4: end if
5: k1 ← {0,1}^κ ; k2 ← {0,1}^κ
6: eint = AR_j / E_pk_j(int | k2 | curr_timestamp + RTT/2)
7: eint = AR_i / E_pk_i(eint | k1 | curr_timestamp)
8: Output eint, k1, k2

Algorithm 2: AR Handling of Encrypted Interests

input : Encrypted interest int_{pk_i,pk_j}, where
        pk_i, pk_j ∈ PK ∪ {⊥} (where "⊥" denotes "no encryption")
output: Interest int_{pk_j}; symmetric key k1
1: (int_{pk_j}, k1, timestamp) = D_sk_i(int_{pk_i,pk_j})
2: if Step 1 fails or timestamp is not current then
3:    Discard int_{pk_i,pk_j}
4: else
5:    Save tuple (int_{pk_i,pk_j}, int_{pk_j}, k1) to internal state
6:    Output int_{pk_j}, k1
7: end if

Algorithm 3: AR Content Routing

input : Content data_{k2} in response to int_{pk_j}, where
        pk_j ∈ PK ∪ {⊥}
output: Encrypted data packet data_{k1}
1: Retrieve tuple (int_{pk_i,pk_j}, int_{pk_j}, k1) from internal state
      where name in int_{pk_j} matches that in data_{k2}
2: if k2 ≠ ⊥ then Remove signature and name from data_{k2}
3: Create new empty data packet pkt
4: Set name on pkt as the name on int_{pk_i,pk_j}
5: Set the data in pkt as E_k1(data_{k2})
6: Sign pkt with AR's key
7: Output pkt as data_{k1}
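
The flow of Algorithms 1-3 can be traced end to end with the following Python sketch. It is an illustration under loud assumptions, not our prototype: `seal`/`unseal` are a toy randomized encrypt-then-MAC construction built from the standard library and stand in for both the CCA-secure public key scheme and the symmetric scheme (so each AR's "public key" is modeled as a shared secret), packet signatures are replaced by placeholder strings, and all function and field names are hypothetical. It should not be used as real cryptography.

```python
import hashlib, hmac, json, os, time

def _stream(key, nonce, n):
    """SHA-256 in counter mode, used as a toy keystream."""
    blocks = (hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
              for i in range((n + 31) // 32))
    return b"".join(blocks)[:n]

def seal(key, msg):
    """Toy randomized encrypt-then-MAC; stand-in for E_pk and E_k."""
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(msg, _stream(key, nonce, len(msg))))
    return nonce + ct + hmac.new(key, nonce + ct, hashlib.sha256).digest()

def unseal(key, blob):
    """Fail on any modified or wrongly keyed ciphertext."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    good = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if len(blob) < 48 or not hmac.compare_digest(tag, good):
        raise ValueError("decryption failed")
    return bytes(a ^ b for a, b in zip(ct, _stream(key, nonce, len(ct))))

def make_interest(name, entry, exit_, now, rtt):
    """Algorithm 1: wrap `name` for the exit AR, then for the entry AR."""
    k1, k2 = os.urandom(16), os.urandom(16)
    inner = {"int": name, "key": k2.hex(), "ts": now + rtt / 2}
    eint = exit_["prefix"] + "/" + seal(exit_["key"], json.dumps(inner).encode()).hex()
    outer = {"int": eint, "key": k1.hex(), "ts": now}
    eint = entry["prefix"] + "/" + seal(entry["key"], json.dumps(outer).encode()).hex()
    return eint, k1, k2

def ar_handle_interest(ar, eint, now, window):
    """Algorithm 2: peel one layer; return (next interest, key) or None."""
    try:
        blob = bytes.fromhex(eint.rsplit("/", 1)[1])
        layer = json.loads(unseal(ar["key"], blob))
    except (ValueError, IndexError):
        return None                       # step 1 failed: discard
    if abs(now - layer["ts"]) > window / 2:
        return None                       # timestamp is not current: discard
    key = bytes.fromhex(layer["key"])
    ar["state"][layer["int"]] = (eint, key)
    return layer["int"], key

def ar_handle_content(ar, pkt):
    """Algorithm 3: re-encrypt returned content under the stored key."""
    eint, key = ar["state"].pop(pkt["name"])
    if pkt.get("wrapped"):                # from another AR: keep ciphertext only
        body = bytes.fromhex(pkt["data"])
    else:                                 # producer content: keep name and signature
        body = json.dumps(pkt).encode()
    return {"name": eint, "data": seal(key, body).hex(),
            "sig": "signed-by:" + ar["prefix"], "wrapped": True}

def consumer_open(pkt, k1, k2):
    """Remove the entry layer (k1), then the exit layer (k2)."""
    return json.loads(unseal(k2, unseal(k1, bytes.fromhex(pkt["data"]))))

entry = {"prefix": "/ndn/ar1", "key": os.urandom(16), "state": {}}
exit_ = {"prefix": "/ndn/ar2", "key": os.urandom(16), "state": {}}
now = time.time()
eint, k1, k2 = make_interest("/ndn/cnn/news/2011aug20", entry, exit_, now, rtt=0.2)
mid, _ = ar_handle_interest(entry, eint, now, window=1.0)
name, _ = ar_handle_interest(exit_, mid, now + 0.1, window=1.0)
content = {"name": name, "data": "headline", "sig": "producer-signature"}
back = ar_handle_content(entry, ar_handle_content(exit_, content))
print(consumer_open(back, k1, k2) == content)
```

The consumer recovers the producer's original name and signature intact, so end-to-end verification is preserved, while each AR sees only the layer addressed to it. Because `seal` is randomized, two interests for the same name produce different encrypted names.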

Session-based Variant. This variant aims to reduce
(amortize) the use of public key encryption, thus lowering
the computational cost and ciphertext size. Before
sending any interests through ephemeral circuits, a consumer
(Alice) establishes a shared secret key with each
selected AR. This is done via a 2-packet interest/content
handshake. We do not describe the details of symmetric
key setup, since there are standard ways of doing
it. We provide two options: one using Diffie-Hellman
key exchange [11], and the other using an SSL/TLS-style
protocol whereby Alice encrypts a key for $AR_i$. Once a
symmetric key $k_{a_i}$ is shared with $AR_i$, Alice can establish
any number of ephemeral circuits using $AR_i$ as either the
first or the second AR hop. Also at setup time, Alice and
$AR_i$ agree on a session identifier value, $sid_{a_i}$, that is
included (in cleartext) in subsequent interests so that $AR_i$
can identify the appropriate entry for Alice and $k_{a_i}$.

The main advantage of the session-based approach 
is better performance: both consumers and routers only 
perform symmetric operations after initial key setup. 
However, one drawback is that, since the session iden- 
tifier sid is not encrypted, packets corresponding to the 
same sid are easily linkable. 

We note that our design neither encourages nor pre- 
vents consumers from mixing asymmetric and session- 
based variants for the same or different ephemeral cir- 
cuits. 
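
The two-layer wrapping of Algorithms 1 and 2 can be illustrated with a small sketch in the session-based setting. This is only a toy model under stated assumptions: the function names (`seal`, `unseal`, `wrap_interest`, `ar_unwrap`), the 5-second freshness window, and the packet layout are hypothetical, and a SHA-256 counter-mode keystream with an HMAC tag stands in for the AES and HMAC combination used by the prototype. It is not meant for real use, and it omits the session identifier, the next-hop AR name, and the per-interest keys $k_1, k_2$.

```python
import hashlib
import hmac
import os
import struct

NONCE_LEN = 16
TAG_LEN = 32


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from one session key.
    return hashlib.sha256(label + key).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode (stand-in for a real cipher).
    blocks = []
    for counter in range((length + 31) // 32):
        blocks.append(hashlib.sha256(key + nonce + struct.pack(">Q", counter)).digest())
    return b"".join(blocks)[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    # Encrypt-then-MAC: nonce || ciphertext || tag.
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    # Verify the tag first; decryption fails if the blob was modified
    # or was sealed under a different key.
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def wrap_interest(interest: bytes, k_first: bytes, k_second: bytes,
                  now: float, rtt: float) -> bytes:
    # Inner layer for the second AR (timestamp shifted by RTT/2),
    # outer layer for the first AR, mirroring lines 6-7 of Algorithm 1.
    inner = seal(k_second, struct.pack(">d", now + rtt / 2) + interest)
    return seal(k_first, struct.pack(">d", now) + inner)


def ar_unwrap(key: bytes, blob: bytes, now: float, max_skew: float = 5.0) -> bytes:
    # One AR step as in Algorithm 2: decrypt, check freshness, output the rest.
    payload = unseal(key, blob)
    if len(payload) < 8:
        raise ValueError("missing timestamp")
    (timestamp,) = struct.unpack(">d", payload[:8])
    if abs(now - timestamp) > max_skew:
        raise ValueError("timestamp is not current")
    return payload[8:]
```

Each AR removes exactly one layer, so the first AR only ever sees an opaque blob destined for the second AR, and the second AR sees the interest but not who wrapped it.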

4.2 System and Security Model 

In order for our discussion to relate to prior work,
we use the notion of "indistinguishable configurations"
from the framework introduced in [19]; the actual definitions
are in Section 5.

Our security analysis considers the worst-case scenario,
i.e., interests being satisfied by the content producer
rather than a router's cache. While, under normal
conditions, encrypted interests are satisfied by caches
only in case of packet loss, fully decrypted interests may
not have to reach content producers. A system that is
secure in case of cache misses is also secure when interests
are satisfied by content cached at routers along the way. 
(Recall that, when an interest is satisfied by a router's 
cache, it is not forwarded any further.) This limits the 
adversary's ability to observe interests in transit. 

Adversary Goals and Capabilities. The goal of an 
adversary is to link consumers with their actions. In par- 
ticular, it may want to determine what content is being 
requested by a particular user and/or which users are re- 
questing specific content. A somewhat related goal is 
determining which cache (if any) is satisfying a con- 
sumer's requests. Our adversary is local and active: it 
controls only a subset of network entities and can per- 
form any action usually allowed to such entities. More- 
over, it is capable of selectively compromising addi- 
tional network entities according to its local information. 

Our model allows the adversary to perform the fol- 
lowing actions: 

• Deploy compromised routers: ANDaNA is an
open network, so an adversary can deploy
compromised anonymizers and regular routers.
Such routers may exhibit malicious behavior, including
injecting, delaying, altering, or dropping traffic.

• Compromise existing routers: An adversary can
select any router (either an AR or a regular router) in
the network and compromise it. As a result, the adversary
learns all the private information (e.g., decryption
keys, pending decrypted interests, cache
content, etc.) of that router.

• Control content producers: Content producers 
are not part of ANDaNA. As such, the network has
no control over them. An adversary can compromise
existing content producers or deploy compromised
ones and convince users to pull content from
them. We also assume that the content providers 
are publicly accessible, and therefore the adversary 
is able to retrieve content from them. 

• Deploy compromised caches: As with compromised
content producers, an adversary can compromise
a router's cache or deploy its own caches.
The behavior of a compromised cache includes 
monitoring cache requests and replying with cor- 
rupted data. 

• Observe and replay traffic: An adversary can tap 
a link carrying anonymized traffic. By doing this 
it learns, among other things, packet contents and 
traffic patterns. The traffic observed by an adversary
can be replayed by any compromised router.

An adversary can iteratively compromise entities of its 
choice, and use the information it gathers to determine 
what should be compromised next. In order to make
our model realistic, we assume that the time required by
an adversary to compromise or deploy a router, a cache or a
content producer is significantly higher than the round-trip
time (RTT) of an anonymized interest and corresponding
data. This implies that all the state information
recovered from a newly compromised router only refers to
packets received after the adversary decides to compromise
that router.

A powerful class of attacks against anonymizing networks
is called fingerprinting [29], [41]. Inter-packet
time intervals are usually not hidden in low-latency
onion routing networks because packets are dispatched
as quickly as possible. This behavior can be exploited
by an adversary, who can correlate inter-packet intervals
on two links and use this information to determine if
the observed packets belong to the same consumer [?].
This class of attacks is significantly harder to execute on
ANDaNA because of the nature of ephemeral circuits and
because of the use of caches on routers. Ephemeral circuits
do not allow the adversary to gather enough packets
with uniform delays, since they are used to transport
only one or a very small number of interests and
corresponding data. Active adversaries who can control the
communication link of a content provider can add measurable
delays to some of the packets in order to identify
consumers. However, consumers may be able to retrieve
the same content through caches, making such an attack
ineffective. Throughput fingerprinting consists of measuring
the throughput of the circuit used by a consumer to
identify the slowest anonymizer in the consumer's circuit
[29]. Throughput fingerprinting is difficult to perform
in ANDaNA since each ephemeral circuit does not
carry enough information to mount an attack. In particular,
the authors of [29] report that a successful attack
requires at least a few minutes of traffic on Tor.
Similarly, ephemeral circuits provide effective protection
against known attacks such as the predecessor
attack [?].

Consumers, Producers and ARs. Each consumer runs 
several processes that generate interests. For our analy- 
sis, interests are created by a specific interface of a host, 
and the corresponding content is delivered back to the 
same interface. Interest encryption is either performed 
on the consumer's host, or on an entity that routes con- 
sumer's traffic. In the latter case, the channel between 
the user and the anonymizing entity is considered se- 
cure. 

Content is generated by producers, i.e., entities that 
can sign data. We do not assume a correspondence
between a producer and a particular host. Content can
be either stored in routers' caches, at servers or dynami- 
cally generated in response to an interest. 

ARs perform interest decryption and content encapsulation.
Each AR advertises a public key for signature
verification and one or more public keys for encryption.
ARs must refresh their encryption keys frequently,
discarding old keys after a short grace period. In order to
simplify key distribution and allow consumers to immediately
trust new public keys from routers, we use a simple
key hierarchy where a long-lived public key owned
by the router (the signing key) is used to certify
short-lived encryption keys. The signing key may be certified
by other entities using techniques like web-of-trust or
PKI.

Denial-of-service Attacks. ANDaNA is envisioned as a 
public overlay network and is clearly susceptible to DoS 
attacks. Since anyone can join ANDaNA as an AR or 
use it as a consumer, we make no distinction between 
insider and outsider attacks. The adversary can send nu- 
merous interests to ARs or construct ephemeral circuits 
longer than two hops in order to maximize effective- 
ness of attacks. Moreover, it can consume AR resources 
by sending malformed encrypted interests that require 
ARs to perform expensive and ultimately useless public 
key decryption. Similar to Tor, before establishing an 
ephemeral circuit, an AR can ask a consumer to solve an 
easy-to-verify/expensive-to-solve puzzle. This and sim- 
ilar techniques for ANDaNA are subjects of future work. 
In a setting with long-lived circuits, such as Tor, disrupt- 
ing a node effectively shuts down all circuits that include 
it. Due to the short lifespan of our ephemeral circuits, 
the same attack on ANDaNA only causes a very small 
number of interests/data packets per user to be dropped. 
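
As an illustration of the kind of easy-to-verify/expensive-to-solve puzzle mentioned above, the following is a minimal hashcash-style sketch. The paper leaves the actual mechanism to future work, so the construction, the function names, and the difficulty parameter `bits` are all assumptions made for this example.

```python
import hashlib


def verify_puzzle(challenge: bytes, nonce: int, bits: int) -> bool:
    # Cheap check for the AR: one hash, then test that the top `bits` bits are zero.
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve_puzzle(challenge: bytes, bits: int) -> int:
    # Expensive search for the consumer: about 2**bits hash evaluations on average.
    nonce = 0
    while not verify_puzzle(challenge, nonce, bits):
        nonce += 1
    return nonce
```

In such a scheme the AR would hand out a fresh random `challenge` and only perform public key decryption once a valid `nonce` accompanies the interest, so that a flood of malformed interests costs the sender more than it costs the AR.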

Abuse. Similar to any other anonymity service, 
ANDaNA can be abused for a variety of nefarious purposes.
We do not elaborate on this topic. However, exit
policies similar to those in Tor [18] can be used with
ANDaNA based on content names.
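
A name-based exit policy could be as simple as a prefix check applied by the last AR to the decrypted interest name. The sketch below is hypothetical: the policy format (a list of blocked name prefixes) and the function names are assumptions for illustration, not part of ANDaNA.

```python
from typing import Iterable, List


def name_components(name: str) -> List[str]:
    # Split an NDN-style name such as "/a/b/c" into its components.
    return [component for component in name.split("/") if component]


def is_allowed(name: str, blocked_prefixes: Iterable[str]) -> bool:
    # Reject the interest if any blocked prefix matches component-wise,
    # so "/example/blocked" does not match "/example/blockedish".
    components = name_components(name)
    for prefix in blocked_prefixes:
        prefix_components = name_components(prefix)
        if components[:len(prefix_components)] == prefix_components:
            return False
    return True
```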



5 Security Analysis 

In this section we propose a formal model for eval- 
uating the security of ANDaNA. We define consumer 
anonymity and unlinkability with respect to an adver- 
sary within this model. We finally provide necessary 
and sufficient conditions for anonymity and unlinkabil- 
ity. As our analysis shows, we are able to obtain a level 
of anonymity comparable to Tor with two — rather than 
Tor's three — ARs thanks to the lack of source addresses 
in NDN interests. 

In general, efficacy of ANDaNA depends on the inability
of the adversary to correlate input and output
of a non-compromised AR, and its inability to observe
all producers and consumers at the same time. Since
ANDaNA is designed for low-latency traffic, we do not
intentionally delay messages or introduce dummy packets,
other than some limited padding. This is similar to
how Tor and other low-latency anonymizing networks
forward traffic, and implies that traffic patterns remain
almost unchanged as they pass through the network [31].
It is well known that, in Tor, this allows an adversary
that observes both ends of a communication flow to confirm
a suspected link between them [5], [35]. For this reason,
a global passive adversary can violate anonymity
properties of both Tor and ANDaNA. However, we believe
that such an adversary is unrealistic in a geographically
distributed network spanning multiple administrative
domains, and designing against it would result
in overkill.

We assume that any adversary monitoring all inter- 
faces of an AR can correlate entering encrypted traffic 
with its exiting, decrypted counterpart using timing in- 
formation. However, we believe that the short lifespan 
of ephemeral circuits - and therefore the limited num- 
ber of related packets traveling through a single AR - 
severely limits the adversary's ability to carry out this 
attack. Unfortunately, at the time of this writing we 
do not have enough experimental evidence to confirm 
this. For the sake of safety, in the analysis below we 
assume that, by compromising all interfaces of an AR, 
the adversary also compromises the AR itself. There- 
fore, a non-compromised AR must have at least one non- 
compromised interface. To sum up, we assume that: 

Assumption 5.1. Adv cannot correlate input and out- 
put of a non-compromised AR. 

Our analysis is based on indistinguishable configura- 
tions. A configuration defines consumers' activity with 
respect to a particular network. Adv only controls a sub- 
set of network entities and observes only some pack- 
ets. Therefore, it cannot distinguish between two con- 
figurations that vary only in the activity that it cannot 
directly observe or in the content of encrypted packets
that it cannot decrypt. In order to provide meaningful
anonymity guarantees, we identify a set of configurations
that have one or more equivalent counterparts.
However, unlike [19], our analysis takes into account
the infrastructure underlying ANDaNA, i.e., the
network topology and packets exchanged over the actual
network. We believe that this makes our model and
analysis both realistic and fine-grained, since it accounts
for all adversarial advantages related to the underlying
network structure. Packets sent by a non-compromised
consumer u to a non-compromised AR r transit through
several (possibly compromised) NDN routers that
are not part of ANDaNA. The model of [19] considers r
compromised even if only one link between u and r is
controlled by the adversary, whereas in our model r is
considered to be non-compromised.

Notation and Definitions 

Table 1 summarizes our notation. The intersection of
P and C might not be empty, which reflects the fact that
consumers can act as producers and vice versa. Similarly,
our model does not prevent routers from being
producers and/or consumers. Therefore, $R \cap P$ and $R \cap C$
might be non-empty.

The adversary is defined as a 4-tuple: $Adv =
(P_{Adv}, C_{Adv}, R_{Adv}, IF_{Adv}) \subset (P, C, R, IF)$ where
individual components specify (respectively) sets of:
compromised producers, consumers, routers and interfaces.
If $r \in R_{Adv}$, then Adv controls all interfaces and has
access to all decryption keys and state information of r. If
all interfaces of r are in $IF_{Adv}$, then $r \in R_{Adv}$. In other
words, for the sake of this analysis, controlling all
interfaces of a router is equivalent to learning that router's
decryption/secret key. We emphasize that for $r \in R$ to
be non-compromised, at least one of its interfaces must
be non-compromised. If $p \in P_{Adv}$, Adv controls p's
interfaces, monitors interests received by p and controls
both content and timing of p's responses to incoming
interests. If $c \in C_{Adv}$, then Adv controls all fields and
timing of interests. Finally, if $if \in IF_{Adv}$, then Adv can
listen to all traffic flowing through $if$, as well as send
new traffic from it. $IF_{Adv}$ includes all the interfaces
of compromised consumers, producers and routers plus
additional interfaces eavesdropped on by Adv.

For ease of notation, we do not explicitly indicate the
name of the next router in interest packets nor symmetric
keys chosen by consumers. We denote encrypted interests
as:

$int_{pk_1,pk_2} = \mathcal{E}_{pk_1}(\mathcal{E}_{pk_2}(int))$

with $pk_1, pk_2 \in \mathcal{PK} \cup \{\perp\}$, where $\perp$ indicates a special
symbol for "no encryption". If $pk_1 = \perp$ then $pk_2 = \perp$.
The size of public keys is a function of the global
security parameter $\kappa$. For simplicity, we denote $int_{pk_1,\perp}$
as $int_{pk_1}$. When an AR receives $int_{pk_1,pk_2}$ and if it is in
possession of the decryption key corresponding to $pk_1$, it
removes the outer layer of encryption. While $\mathcal{E}$ is
CCA-secure (and therefore also CPA-secure), we do not
require $\mathcal{E}$ to be key private [?]. Key privacy prevents an
observer from learning the public key used to generate
a ciphertext. In ANDaNA, knowledge of the public key
used to encrypt the outer layer of an interest does not
reveal any more information than the (cleartext) name on
the interest.

We define the anonymity set with respect to interface
$if_i^r$ as:

$A^{int}_{if_i^r} = \{ d \mid \Pr[ d \to_{int} r \mid int \to if_i^r ] > 0 \}$

In other words, for each interface $if_i^r$ of router r, $A^{int}_{if_i^r}$
contains all entities that could have sent int with non-zero
probability. We define $path^{int} = \{ if_i^r \mid int \to if_i^r \}$.
This is the sequence of interfaces traversed by int. We
use it to define the anonymity set of an interest with
respect to Adv:

$A^{int}_{Adv} = \bigcap_{if_i \in path^{int} \cap IF_{Adv}} A^{int}_{if_i}$

Intuitively, if u is far away from a compromised entity
d, then all sets $A^{int}_{if_i^d}$ such that $u \in A^{int}_{if_i^d}$ are a large
subset of C. Adv can rule out possible senders of an
interest (i.e., determine if $u \notin A^{int}_{Adv}$) only if it controls
at least one entity (routers, interfaces) along each path
that u does not share with other consumers. The level of
anonymity of $u \in C$ with respect to Adv is proportional
to the size of $A^{int}_{Adv}$. In particular, if u is the only
member of $A^{int}_{Adv}$, it has no anonymity, since int must
have been issued by u.
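
The intersection above can be made concrete with a short sketch. It assumes the per-interface anonymity sets are already known and passed in as a dictionary; the function name, argument names, and the convention that an interest unobserved by Adv keeps the full entity set are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set


def adversary_anonymity_set(path: Iterable[str],
                            compromised_ifaces: Set[str],
                            senders_by_iface: Dict[str, Set[str]],
                            all_entities: Set[str]) -> Set[str]:
    # Intersect the per-interface anonymity sets over the interfaces on the
    # interest's path that the adversary can observe.
    candidates = set(all_entities)
    for iface in path:
        if iface in compromised_ifaces:
            candidates &= senders_by_iface[iface]
    return candidates
```

For example, if Adv observes two interfaces whose possible-sender sets are {u, v, w} and {u, v, x}, the interest's anonymity set with respect to Adv shrinks to {u, v}; observing a single interface whose set is {u} leaves u with no anonymity.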

A configuration is a description of the network activity.
Each configuration maps consumers to their actions,
defined as the interest they issue and the corresponding
content producers. More formally, a configuration is a
relation:

$\mathcal{C} : C \to \{(r_1, r_2, p, int_{pk_1,pk_2})\}$

with $(r_1, r_2, p, int_{pk_1,pk_2}) \in R^2 \times P \times \{0,1\}^*$, that maps
a consumer to: a pair of routers defining an ephemeral
circuit, an interest (encrypted for this circuit) and a
producer. $\mathcal{C}(u)$ is a 4-tuple that represents one action of u
in $\mathcal{C}$. $\mathcal{C}_i$ is the selection on the i-th component of $\mathcal{C}$,
i.e., if $\mathcal{C}(u) = (r_1, r_2, p, int_{pk_1,pk_2})$, then $\mathcal{C}_1(u) = r_1$,
$\mathcal{C}_2(u) = r_2$, $\mathcal{C}_3(u) = p$ and $\mathcal{C}_4(u) = int_{pk_1,pk_2}$.

We say that two configurations $\mathcal{C}$ and $\mathcal{C}'$ are
"indistinguishable with respect to Adv" if Adv can only
determine with probability at most $1/2 + \epsilon$ which
configuration corresponds to the observed network, for some
$\epsilon$ negligible in the security parameter $\kappa$. We denote two
such configurations as $\mathcal{C} \equiv_{Adv} \mathcal{C}'$.

We now show that Assumption 5.1 holds if a passive
adversary observes only input and output values of
Notation              Meaning
C                     set of all consumers, $u \in C$
P                     set of all content producers, $p \in P$
R                     set of all routers, $r \in R$
IF                    set of all interfaces on all network devices
$if_i^r \in IF$         i-th interface on router r
$\mathcal{E}$           CCA-secure hybrid encryption scheme
$\mathcal{PK}$          set of all public keys
$(pk_i, sk_i)$          public/private encryption key pair of an AR
Adv                   adversary
d                     an entity, i.e., a router or a host
$d \to_{int} r$         entity d sends interest int to some interface of router r
$int \to if_i^r$        router r receives interest int on interface $if_i^r$
$int_{pk_1,pk_2}$       interest encrypted under public keys $pk_1, pk_2$
$\perp$                 no encryption

Table 1. Notation.



an AR (i.e., it cannot use timing information or other 
side-channels), and the underlying encryption scheme 
is semantically secure. Claim 5.1 below states that, for
any encrypted interest, Adv cannot determine if it corre- 
sponds to an interest decrypted by a non-compromised 
router, by observing the two and with no additional in- 
formation. 

Claim 5.1. Given any CPA-secure public key encryption
scheme $\mathcal{E}$ and two same-length interests $int^0$, $int^1$
chosen by Adv, Adv has only negligible advantage over
1/2 in determining the value of a randomly selected bit
b, given $int^b_{pk_1,pk_2}$, $int^0_{pk_2}$ and $int^1_{pk_2}$, with $pk_1 \in \mathcal{PK}$
and $pk_2 \in \mathcal{PK} \cup \{\perp\}$.

Due to the lack of space, Claim 5.1 is formally justified
in Appendix A.

Anonymity Definitions and Conditions 

In this section we present formal definitions of 
anonymity for our model. We introduce the notions of 
consumer anonymity, producer anonymity and producer 
and consumer unlinkability. We show that ephemeral
circuits composed of two anonymizing routers, at
least one of which is not compromised, provide consumer
and producer anonymity. This, in turn, implies
consumer and producer unlinkability. Due to the lack
of space, we defer formal proofs of the theorems in this
section to Appendix A.

A consumer u enjoys consumer anonymity if Adv
cannot determine whether u or a different user u' is
retrieving some specific content. This notion is formalized
using indistinguishable configurations: given a
configuration $\mathcal{C}$ in which u retrieves content t, u has
consumer anonymity if there exists another configuration
$\mathcal{C}'$ in which u' retrieves t and Adv cannot determine
whether it is observing $\mathcal{C}$ or $\mathcal{C}'$. More formally:

Definition 5.1 (Consumer anonymity). $u \in (C \setminus C_{Adv})$
has consumer anonymity in configuration $\mathcal{C}$ with respect
to Adv if there exists $\mathcal{C}' \equiv_{Adv} \mathcal{C}$ such that $\mathcal{C}'(u') =
\mathcal{C}(u)$ and $u' \ne u$.

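Theorem 5.1. $u \in (C \setminus C_{Adv})$ has consumer anonymity
in $\mathcal{C}$ with respect to Adv if there exists $u' \ne u$ such that
any of the following conditions hold:

1. $u, u' \in A^{int_{pk_1,pk_2}}_{Adv}$

2. $\mathcal{C}_1(u) = \mathcal{C}_1(u')$, $\mathcal{C}_1(u) \notin R_{Adv}$ and $\mathcal{C}_1(u) \in
A^{int_{pk_2}}_{Adv}$ where $\mathcal{C}_4(u) = int_{pk_1,pk_2}$

3. $\mathcal{C}_2(u) = \mathcal{C}_2(u')$, $\mathcal{C}_2(u) \notin R_{Adv}$ and $\mathcal{C}_2(u) \in
A^{int}_{Adv}$ where $\mathcal{C}_4(u) = int_{pk_1,pk_2}$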

Informally, the theorem above states that ANDaNA provides
consumer anonymity with respect to Adv if: (1)
Adv cannot observe encrypted interests coming from u
and u', or it cannot distinguish between the two consumers
due to anonymity provided by the network layer;
or (2) u, u' share a non-compromised first router in at
least one ephemeral circuit; or (3) u, u' share a
non-compromised second router in at least one ephemeral
circuit.
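
The router-sharing part of the informal conditions (2) and (3) can be sketched as a simple check over a configuration. This is a simplification under stated assumptions: it represents a configuration as a dictionary from consumers to 4-tuples, uses hypothetical names (`Action`, `anonymity_witnesses`), and deliberately ignores the anonymity-set membership requirements and condition (1).

```python
from typing import Dict, NamedTuple, Set


class Action(NamedTuple):
    r1: str          # first anonymizing router
    r2: str          # second anonymizing router
    producer: str    # content producer
    interest: bytes  # encrypted interest


def anonymity_witnesses(config: Dict[str, Action], u: str,
                        compromised_routers: Set[str]) -> Set[str]:
    # Return every other consumer u' that shares a non-compromised first
    # or second router with u in this configuration.
    action = config[u]
    witnesses = set()
    for other, other_action in config.items():
        if other == u:
            continue
        shares_first = (other_action.r1 == action.r1
                        and action.r1 not in compromised_routers)
        shares_second = (other_action.r2 == action.r2
                         and action.r2 not in compromised_routers)
        if shares_first or shares_second:
            witnesses.add(other)
    return witnesses
```

An empty result does not by itself mean that u lacks anonymity, since condition (1) may still hold; a non-empty result identifies candidate consumers u' for which the router-sharing requirement is met.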

Similarly to consumer anonymity, producer
anonymity is defined in terms of indistinguishable
configurations. In particular, a producer p enjoys
anonymity with respect to Adv which observes
$int_{pk_1,pk_2}$ if Adv cannot distinguish between a configuration
$\mathcal{C}$ where p produces the content corresponding to
int and a configuration $\mathcal{C}'$ where p' and not p produces
that content.

Definition 5.2 (Producer anonymity). Given $int_{pk_1,pk_2}$
for $p \in P$, $u \in C$ has producer anonymity in configuration
$\mathcal{C}$ with respect to p, Adv if there exists an
indistinguishable configuration $\mathcal{C}'$ such that $int_{pk_1,pk_2}$ is sent
by a non-compromised consumer to a producer different
from p.

Theorem 5.2. u has producer anonymity in $\mathcal{C}$ with respect
to p, Adv if any of the following conditions hold:

1. There exists $\mathcal{C}(u)$ such that $\mathcal{C}_1(u)$ (the first
anonymizing router) is not compromised and
$\mathcal{C}_4(u) = int_{pk_1,pk_2}$, $\mathcal{C}_1(u) = \mathcal{C}_1(u')$ and $\mathcal{C}_3(u) =
p \ne \mathcal{C}_3(u')$ for some non-compromised $u' \in C$, or

2. There exists $\mathcal{C}(u)$ such that $\mathcal{C}_2(u)$ (the second
anonymizing router) is not compromised and
$\mathcal{C}_4(u) = int_{pk_1,pk_2}$, $\mathcal{C}_2(u) = \mathcal{C}_2(u')$ and $\mathcal{C}_3(u) =
p \ne \mathcal{C}_3(u')$ for some non-compromised $u' \in C$.

Finally, we define producer and consumer unlinkability
as:

Definition 5.3 (Producer and consumer unlinkability).
We say that $u \in (C \setminus C_{Adv})$ and $p \in P$ are unlinkable in
$\mathcal{C}$ with respect to Adv if there exists $\mathcal{C}' \equiv_{Adv} \mathcal{C}$ where
u's interests are sent to a producer $p' \ne p$.

Corollary 5.1. Consumer $u \in (C \setminus C_{Adv})$ and producer
$p \in P$ are unlinkable in configuration $\mathcal{C}$ with respect
to Adv if p has producer anonymity with respect to u's
interests or u has consumer anonymity and there exists
a configuration $\mathcal{C}' \equiv_{Adv} \mathcal{C}$ where $\mathcal{C}'(u') = \mathcal{C}(u)$ with
$u' \ne u$ and u''s interests have a destination different
from p.

Corollary 5.2. Consumer $u \in (C \setminus C_{Adv})$ and producer
$p \in P$ are unlinkable in configuration $\mathcal{C}$ with respect to
Adv if both producer and consumer anonymity hold.

We emphasize that this result also holds for 
ephemeral circuits with length greater than two ARs. 

6 Implementation and Performance 

ANDaNA is implemented as an application-level service
consisting of a client "stack" (used by consumers)
and a server program that runs on ANDaNA ARs. Both
are written in C and interface to NDN through Unix
domain sockets.* Cryptographic algorithms are implemented
using OpenSSL [42]. Hybrid encryption is obtained
using RSA-OAEP [10] and AES+HMAC [?].
The latter is also used for symmetric encryption. We
use SHA-256 for HMAC and 1024- and 128-bit keys
for RSA and AES, respectively. Loose time synchronization
among ANDaNA clients and servers is achieved
using pool.ntp.org, a public pool of NTP servers.

The ANDaNA client encrypts interests from user applications.
In order to hide all possible sources of de-anonymizing
information, encryption is performed over
the full interest packet, including the name, scope, exclusion
filters and duplicate suppression string fields. Following
NDN "rules", ANDaNA AR announces the ability
to serve the root ("/") namespace and receives all
traffic sent from (or to) the local NDN routing process. 
This allows traffic to be routed through ANDaNA by 
default, requiring no changes to existing applications. 
For more granularity, consumers can vary the default 
namespace, e.g., "/andana/". However, this would 
require privacy-seeking applications to explicitly direct 
their traffic to that namespace, similar to today's config- 
urable proxy settings. 

ANDaNA servers run as applications on NDN routers.
Each server is responsible for its relay and session cre- 
ation namespaces. The former is a globally routable 
namespace used for receiving both session-based and 
asymmetrically encrypted Interests. Clients using 
session-based encryption in ANDaNA need to first establish
symmetric keys with servers. To start a new
session with a server, a client sends an interest in the
createsession namespace, registered by the server 
code as a sub-prefix of the relay namespace. 

*At the time of this writing, there is no direct function interface to
NDN.



We deployed our prototype and ran a series of tests
on the Open Network Laboratory (ONL) [34]. ONL is a
testbed developed by Washington University to enable
experimental evaluation of advanced networking concepts
in a realistic environment. To guarantee highly
reproducible results, ONL provides reservation-based
exclusive access to most of its host and network resources.
All our experiments used single-core Linux machines
with 512 MB of RAM and gigabit switches (one machine
per switch).

We compare plain NDN and ANDaNA on a simple
line topology with four switches and four Linux machines,
each corresponding to an NDN node. Static
routing is established between nodes. The first NDN
node in the line topology acts as a consumer and runs
ccngetfile, a small tool from the CCNx open-source
library that retrieves data published as NDN content and
stores it in a local file. We performed tests with 1, 10,
and 100MB files; each file was retrieved from the NDN
repository of the machine at the other end of the line
topology. Results of this comparison for 10MB files are
summarized in Fig. 1. Due to space constraints, we illustrate
all file retrieval results in Appendix B. Results show
that computational overhead introduced by ANDaNA
roughly doubles download times over plain NDN. This
is assuming an almost-perfect world where ARs
topologically align with the best path and link bandwidths
are abundant.

In order to compare ANDaNA's computational overhead
with a similar anonymizing tool, we deployed Tor
over ONL and measured its overhead over TCP/IP. We
measured performance of the TCP/IP baseline by deploying
five switches, connected in a line, and two Linux machines
(one at each end): the first acting as client (running
curl), the second as server (running the lighttpd
HTTP server). Performance of Tor was measured on a
topology that closely mimics that of the TCP/IP baseline:
five switches, connecting three Tor relays, a client and a
server. To ensure a "line" topology, the Tor client is configured
to use explicit entry and exit nodes; DNS lookups
are avoided by using IP addresses in all tests.

Before discussing the results, we mention some com- 
parison details. NDN is a research project and its code 
is optimized for functionality rather than performance. 
It provides content authentication through digital signa- 
tures - a computationally expensive feature not present 
in either TCP/IP or Tor. NDN stack currently runs as a 
user-space application, in contrast to TCP/IP that runs 
in kernel-space. Finally, in all our experiments, NDN 
had to run on top of TCP/IP (rather than at layer 2) due 
to limitations of the underlying ONL testbed. Consequently,
we believe a fair comparison between ANDaNA
and Tor can only be achieved by focusing the analysis
on the relative overhead imposed by each over the network
on which


[Figure 1: two panels plotting RTT against Start Time (s)]

Figure 1. Left: RTT for 10MB of content over NDN
(limited anonymity). Right: RTT for 10MB of content
over ANDaNA (full anonymity).

Figure 2. Comparison of 1, 10, and 100MB file download
times over Tor, ANDaNA-S and ANDaNA-A with
respect to respective baselines. Left: transfer time and
circuit setup time. Right: transfer time only.

it is deployed, i.e., NDN and TCP/IP respectively.

Figure 2 shows the performance of ANDaNA and
Tor with respect to their baselines. The graph on the
left shows the measurements including the time required
to set up a Tor circuit and all ephemeral circuits
for ANDaNA. Session-based ANDaNA is denoted
by ANDaNA-S, while ANDaNA with asymmetric encryption
is referred to as ANDaNA-A. For small- to medium-size
files (1-10MB), overhead of ANDaNA-A is between
1.5× and 1.75×. As expected, ANDaNA-S exhibits
lower overhead (1.45× to 1.7×) due to more efficient
symmetric encryption.

In comparison, Tor's download time for the same
amount of data is between 2.3 and 7 times higher than
that of TCP/IP. This imposes significant overhead for
content sizes typical of many web pages, whereas
ANDaNA is efficient in anonymizing such traffic patterns.
Large file transfers are more efficient with Tor,
which increases the total download time by about 1.4
times, compared to 2.4 and 2.1 for ANDaNA-A and
ANDaNA-S, respectively.

The right-side graph in Figure 2 shows the relative
speed of the three approaches without including circuit
setup time. Our measurements show that overhead of
ephemeral circuit creation in ANDaNA-S is negligible.
Since a new ephemeral circuit must be selected for every
interest with ANDaNA-A, we simply report the same
values from the previous graph. Results confirm that
overhead of circuit creation in Tor is significant when
retrieving small-size content. Removing this initialization
phase from the measurements significantly reduces
Tor's overhead. However, the overhead of ANDaNA with
respect to its baseline is still smaller than that of Tor for
content up to 10MB.

In absolute terms (comparing raw download times),
Tor + TCP/IP performs better than ANDaNA + NDN in
our testbed experiments. However, we believe that, in
a realistic geographically-distributed deployment setting
with limited-bandwidth links, ANDaNA + NDN would
provide a significant performance advantage over Tor +
TCP/IP due to its shorter (ephemeral) circuits. In other
words, we anticipate that shorter circuits and content
caching in ANDaNA + NDN would result in appreciably
lower overall download times than Tor + TCP/IP in
a global internet setting.

7 Conclusions and Future Work 

Content-centric networking is a major transition from
today's world that focuses on communication end-points.
The NDN project represents one of the most visible
current research efforts aiming to bring content-centric
networking into the foreground by using it as a possible
future Internet architecture. Despite some privacy-friendly
features and side-effects, NDN poses some interesting
privacy challenges. This work presents an initial
attempt to provide anonymity in NDN. The main
contribution of this work is threefold: (1) exploration
of privacy issues in NDN, (2) design of an anonymization
tool, ANDaNA, and (3) its security analysis and
performance assessment.

At the same time, particularly because the entire
NDN project (and, of course, ANDaNA) represents
work-in-progress, one of the main goals of this paper is to
solicit comments from the security research community.
Also, since our work merely scratches the surface of
privacy issues in content-centric networking and NDN, a
number of issues are left for future work, including:

• More performance experimentation with ANDaNA,
especially in larger testbeds and under various traffic
load / congestion scenarios. (This should lead to
better code profiling and lower overhead.)

• Comprehensive directory service for effective 
large-scale distribution of up-to-date AR informa- 
tion. 

• In-depth study of both privacy and performance 
trade-offs in the use of asymmetric vs. symmetric 
ANDaNA variants. 

• DoS mitigation measures, such as computational 
puzzles for circuit establishment. 

• Red-teaming experiments to assess realistic privacy 
attainable with ANDaNA. 

• Modification of ANDaNA to support other emerg- 
ing content-centric architectures and comparative 
experiments among them. 
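
As an illustration of the DoS mitigation item above, the following is a minimal sketch of a hash-based client puzzle in the style of Hashcash. It is not part of ANDaNA: the choice of SHA-256, the 8-byte nonce encoding, the difficulty value, and the function names `solve_puzzle` and `verify_puzzle` are all assumptions made for the example.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify_puzzle(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Verifier side: a single hash evaluation checks the submitted work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve_puzzle(challenge: bytes, difficulty: int) -> int:
    """Requester side: brute-force a nonce (about 2**difficulty hashes on average)."""
    nonce = 0
    while not verify_puzzle(challenge, nonce, difficulty):
        nonce += 1
    return nonce


challenge = os.urandom(16)                    # fresh random challenge per request
nonce = solve_puzzle(challenge, difficulty=12)
```

The asymmetry is the point of the design: the requester performs thousands of hash evaluations, while the verifier performs one. How such a challenge would be carried in interests and content objects is left open here.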
A Security Proofs

Justification of Claim 5.1. Suppose that Claim 5.1 is
false. Then, Adv can be used to construct an algorithm
Sim that breaks the CPA-secure encryption scheme
$\mathcal{E}$ as follows: Sim plays the CPA-security game
with a challenger, which selects a public key $pk$. Sim
selects a public key $pk_2$ and initializes Adv, which
eventually returns two interests $int^0, int^1$ of its choice.
Sim sends $c_0 = \mathcal{E}_{pk_2}(int^0)$ and
$c_1 = \mathcal{E}_{pk_2}(int^1)$ to the challenger, which
returns $c^* = \mathcal{E}_{pk}(c_b) =
\mathcal{E}_{pk}(\mathcal{E}_{pk_2}(int^b))$. Sim sends
$(c^*, c_0, c_1)$ to Adv, which eventually returns its
choice $b'$. Sim outputs $b'$ as its choice. The output of
Sim is $b' = b$ iff Adv guesses $b$ correctly. Since Adv
guesses $b$ correctly with non-negligible advantage over
1/2, Sim breaks the CPA-security of $\mathcal{E}$ with
non-negligible advantage. This violates the hypothesis of
Claim 5.1 and, therefore, such Adv cannot exist. □
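
The reduction can be summarized as a relation between success probabilities. This is only a restatement of the argument, assuming that Sim reproduces the view of Adv exactly; the symbol $\epsilon$, denoting the non-negligible advantage of Adv, is notation introduced here for illustration.

```latex
\[
\Pr[\text{Sim outputs } b' = b]
  \;=\; \Pr[\text{Adv outputs } b' = b \text{ on input } (c^*, c_0, c_1)]
  \;\ge\; \tfrac{1}{2} + \epsilon
\]
```

Since Sim simply forwards the guess of Adv, it wins the CPA-security game with advantage at least $\epsilon$, which is impossible for a CPA-secure scheme when $\epsilon$ is non-negligible.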

Proof of Theorem 5.1: Consumer Anonymity (sketch).
We prove that each condition in Theorem 5.1 implies
consumer anonymity:

1. Assume that, for each $u' \neq u$, there exists no
configuration $C' \equiv_{Adv} C$ with respect to Adv such
that $C'(u') = C(u)$. Adv cannot determine that
$C(u) \in C$ using only $C_2(u)$, $C_3(u)$ and $C_4(u)$: if
$C_1(u) = C'_1(u')$ for some $C' \equiv_{Adv} C$ and $u'$
(i.e., there exists an indistinguishable configuration with
respect to Adv where a consumer different from $u$
sends an interest to $C_1(u)$ through interface
$if_i^{C_1(u)}$ and $u, u' \in A_{if_i^{C_1(u)}}$), then there
must exist a tuple $C'(u') = C(u)$, since (a possibly
compromised) $r$ cannot process interests coming from
consumers in the same anonymity set differently; that
would imply that they are not in the same anonymity set.
Therefore, for each configuration $C' \equiv_{Adv} C$ and
for each $u' \neq u$,
$\exists\, C'_1(u') = C_1(u) \Rightarrow \exists\, C'(u') = C(u)$.

For this reason, $C'_1(u') \neq C_1(u)$ for all
$C' \equiv_{Adv} C$ and for all $u' \neq u$, i.e.,
$\forall\, C'_1(u') = C_1(u):\ C(u) \notin C'$. This is true
if and only if Adv controls at least one interface $if'_i$
on the path for which $u'$ is not in the anonymity set of
$if'_i$, i.e., $u' \notin A_{if'_i}$. Since this contradicts
the hypothesis, there must exist a configuration $C'$
indistinguishable from $C$ with respect to Adv such that
$C'(u') = C(u)$.

2. We assume that, for each $u' \neq u$, Adv can
distinguish interests from $u$ from those from $u'$
(i.e., condition 1 of Theorem 5.1 does not hold). We
show how to prove Theorem 5.1 by reduction. Assume
that there exists an efficient adversary Adv such that
$C_{Adv} = C \setminus \{u, u'\}$ and
$R_{Adv} = R \setminus \{r_1\}$ (i.e., Adv compromised all
entities except $u$, $u'$ and $r_1$). Suppose that
$C(u) = (r_1, r_2, p, int^0_{pk_1,pk_2})$ and
$C(u') = (r_1, r'_2, p', int^1_{pk_1,pk'_2})$ for some
$r_2, r'_2, p, p', int^0, int^1$. For each $C'$, Adv outputs
1 on input of $C$ and 0 on input of $C'$ with
non-negligible probability, where $C'(u) = C(u')$ and
$C'(u') = C(u)$. In other words, there is no configuration
for which $C \equiv_{Adv} C'$ holds. We sketch how Adv
can be used as a subroutine in a simulator Sim that
breaks Claim 5.1.
Sim creates a random network topology and inputs it to
Adv. Sim also inputs the information that Adv would
obtain by compromising all entities except $u$, $u'$ and
$r_1$. As such, Sim also includes the encrypted interests
received from the challenger of Claim 5.1 in the input of
Adv. Then, Sim sends to Adv configurations $C$ and $C'$,
where $C'$ is identical to $C$, except that
$C(u) = C'(u')$ and $C(u') = C'(u)$, and
$C(u) \neq C(u')$. We have that $b = 1$ iff Adv outputs 1.
Since the existence of Sim violates Claim 5.1, Adv cannot
exist.

3. We assume that, for each $u' \neq u$, Adv can
distinguish interests from $u$ from those from $u'$
(i.e., condition 1 of Theorem 5.1 does not hold) and that
the first router in the paths of $u$ and $u'$ is
compromised, i.e., condition 2 of Theorem 5.1 does not
hold. We then prove Theorem 5.1 by reduction. Assume
that there exists an efficient adversary Adv such that
$C_{Adv} = C \setminus \{u, u'\}$ and
$R_{Adv} = R \setminus \{r_2\}$ (i.e., Adv compromised all
entities except $u$, $u'$ and $r_2$). Suppose that
$C(u) = (r_1, r_2, p, int^0_{pk_1,pk_2})$ and
$C(u') = (r'_1, r_2, p', int^1_{pk'_1,pk_2})$ for some
$r_1, r'_1, p, p', int^0, int^1$. For each $C'$, Adv outputs
1 on input of $C$ and 0 on input of $C'$, where
$C'(u) = C(u')$ and $C'(u') = C(u)$. In other words, there
is no configuration where $C \equiv_{Adv} C'$ holds. We
sketch how Adv can be used as a subroutine in a
simulator Sim to determine, given $int_{pk_2}$ and
$int'_{pk_2}$, whether $int = int'$.
Sim creates a random network topology and inputs it to
Adv. Sim also inputs the information that Adv would
obtain by compromising all entities except $u$, $u'$ and
$r_2$. Sim interacts with the challenger of Claim 5.1,
setting the innermost key of its challenge, denoted as
$pk_2$, to $\perp$. Sim receives the encryption of $int^b$
for some $int^0, int^1$ of its choice, and adds it, together
with the encryptions of $int^0$ and $int^1$, to the input of
Adv. Then Sim sends to Adv configurations $C$ and $C'$,
where $C'$ is identical to $C$ except that
$C(u) = C'(u')$ and $C(u') = C'(u)$, and
$C(u) \neq C(u')$. We have that $b = 1$ iff Adv outputs 1.
Since the existence of Sim would violate Claim 5.1, Adv
cannot exist. □
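
The argument above repeatedly compares a configuration C with a variant C' in which two consumers exchange their tuples. The following is a minimal sketch of that construction, assuming a configuration is represented as a mapping from a consumer to a 4-tuple (first router, second router, producer, encrypted interest). The names `Route` and `swap_consumers` and all sample values are hypothetical.

```python
from typing import Dict, NamedTuple


class Route(NamedTuple):
    """One entry C(u): first router, second router, producer, encrypted interest."""
    r1: str
    r2: str
    producer: str
    interest: bytes


Configuration = Dict[str, Route]


def swap_consumers(config: Configuration, u: str, u_prime: str) -> Configuration:
    """Return C' identical to C except that C'(u) = C(u') and C'(u') = C(u)."""
    swapped = dict(config)  # shallow copy; the original mapping is left untouched
    swapped[u], swapped[u_prime] = config[u_prime], config[u]
    return swapped


# Hypothetical two-consumer configuration sharing the first router.
C = {
    "u": Route("r1", "r2", "p", b"enc-int0"),
    "u'": Route("r1", "r2'", "p'", b"enc-int1"),
}
C_prime = swap_consumers(C, "u", "u'")
```

Consumer anonymity then amounts to Adv being unable to tell C from C_prime given everything it observes.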



Proof of Theorem 5.2: Producer Anonymity (sketch).
We prove that each condition in Theorem 5.2 implies
producer anonymity:

1. Let $C_4(u') = int'_{pk'_1,pk'_2}$ and let $C'$ be
identical to $C$ except that
$C'(u) = (C_1(u), C_2(u), C_3(u), C_4(u'))$ and
$C'(u') = (C_1(u'), C_2(u'), C_3(u'), C_4(u))$. In other
words, $C'$ is a configuration where $int_{pk_1,pk_2}$ is
sent to a producer different from $p$. In this setting, Adv
can only distinguish $C'$ and $C$ by distinguishing
$C'(u)$ and $C'(u')$. Claim 5.1 guarantees that Adv that
observes $int_{pk_1,pk_2}$ and $int'_{pk'_1,pk'_2}$ cannot
determine which corresponds to $int$ and which to
$int'$. Moreover, Assumption 5.1 prevents Adv from
linking the output of non-compromised router $C_1(u)$
with $int_{pk_1,pk_2}$ and $int'_{pk'_1,pk'_2}$. Therefore,
$C \equiv_{Adv} C'$.

2. Similarly, let $C_4(u') = int'_{pk'_1,pk'_2}$ and let
$C'$ be identical to $C$ except that
$C'(u) = (C_1(u), C_2(u), C_3(u), C_4(u'))$ and
$C'(u') = (C_1(u'), C_2(u'), C_3(u'), C_4(u))$. We assume
that $C_1(u)$ and $C_1(u')$ are compromised. In this
setting, Adv can only distinguish between $C$ and $C'$
by distinguishing $C'(u)$ and $C'(u')$. Claim 5.1
guarantees that any Adv that observes $int_{pk_1,pk_2}$
and $int'_{pk'_1,pk'_2}$ cannot determine which
corresponds to $int$ and which to $int'$. Moreover,
Assumption 5.1 prevents Adv from linking the output of
non-compromised router $C_2(u)$ with $int_{pk_2}$ and
$int'_{pk'_2}$. Therefore, $C \equiv_{Adv} C'$. □
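
Both cases build C' from C by exchanging only the fourth component (the encrypted interest) of two entries while keeping routers and producers fixed. A minimal sketch of that construction, assuming each entry is a plain 4-tuple; the name `exchange_interests` and the sample values are hypothetical.

```python
from typing import Dict, Tuple

Entry = Tuple[str, str, str, bytes]  # (C_1, C_2, C_3, C_4)


def exchange_interests(config: Dict[str, Entry], u: str, u_prime: str) -> Dict[str, Entry]:
    """Return C' equal to C except that u and u' exchange their fourth component."""
    a, b = config[u], config[u_prime]
    result = dict(config)
    result[u] = a[:3] + (b[3],)
    result[u_prime] = b[:3] + (a[3],)
    return result


# Hypothetical configuration with two consumers and distinct producers.
config = {
    "u": ("r1", "r2", "p", b"enc-int"),
    "u'": ("r1'", "r2'", "p'", b"enc-int-prime"),
}
config_prime = exchange_interests(config, "u", "u'")
```

In `config_prime`, the interest originally issued by `u` travels towards `p'`, which matches the description of a configuration where the interest is sent to a producer different from `p`.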
B Performance Evaluation: Additional Results

Figure 3. Round trip time for transferring 1, 10 and 100MB of content over NDN (limited anonymity)

[Plots: three panels, one per content size; x-axis: Start Time (s).]
Figure 4. Round trip time for transferring 1, 10 and 100MB of content over ANDaNA (full anonymity). 